The following sections provide details for the analyses performed in:
DNA methylation and gene expression as determinants of genome-wide cell-free DNA fragmentation
Michaël Noë, Dimitrios Mathios, Akshaya V. Annapragada, Shashikant Koul, Zacharia H. Foda, Jamie Medina, Stephen Cristiano, Christopher Cherry, Daniel C. Bruhm, Noushin Niknafs, Vilmos Adleff, Leonardo Ferreira, Hari Easwaran, Stephen Baylin, Jillian Phallen, Robert B. Scharpf, and Victor E. Velculescu. DNA methylation and gene expression as determinants of genome-wide cell-free DNA fragmentation
Circulating cell-free DNA (cfDNA) is emerging as a diagnostic avenue for cancer detection, but the characteristics and origins of cfDNA fragmentation in the blood are poorly understood. We evaluated the effect of DNA methylation and gene expression on naturally occurring genome-wide cfDNA fragmentation through analysis of plasma from 969 individuals, including 182 with cancer. cfDNA fragment ends occurring at preferred locations genome-wide more frequently contained CCs or CGs, and fragments ending with CGs or CCGs were enriched or depleted, respectively, at methylated CpG positions, consistent with structural models showing increased interaction of methylated CG fragment ends with nucleosomes. Higher levels and larger sizes of cfDNA fragments were independently associated with regions of CpG methylation and reduced gene expression, and reflected differences in cfDNA fragmentation in tissue-specific pathways. The effects of methylation and expression on cfDNA coverage were validated by analyses of human cfDNA in mice implanted with isogenic tumors with or without the mutant IDH1 chromatin modifier. Tumor-related hypomethylation and increased gene expression were associated with global decrease in cfDNA fragment size that may explain the overall smaller cfDNA fragments observed in human cancers. Cancer-specific methylation at CpGs of pancreatic cancer patients was associated with genome-wide changes in cfDNA fragment ends in patients with cancers. These results provide a connection between epigenetic changes and cfDNA fragmentation that may have implications for disease detection.
The analyses explained in this file use data from previous studies (Cristiano et al., Nature, 2019 and Mathios et al., Nature Communications, 2021). The data for the samples analyzed in these studies was deposited at the database of Genotypes and Phenotypes (dbGaP) and the European Genome-Phenome Archive (EGA). The GitHub repositories for these studies explain how the data was analysed up to the point where a GenomicRanges (GRanges) object was made with the fragment chromosome, start and end positions (pre-processing in the ‘reproduce_lucas_wflow’ GitHub repository). These objects were saved as ‘.rds’ files. Note the coordinate convention: GRanges objects are 1-based, so the start is the position of the first base of the fragment, whereas BED files (not used here) are 0-based and record the position before the first base as the start.
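The difference between the two coordinate conventions can be illustrated with a minimal base R sketch. The fragment coordinates and the helper name `bed_to_one_based` are hypothetical and are not part of the repository; the sketch only shows that the start shifts by one while the end stays the same.

```r
# Hypothetical fragment coordinates in BED convention (0-based start, end = last base).
bed <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(999L, 1166L, 49L),    # position before the first base of the fragment
  end   = c(1166L, 1333L, 216L),  # last base of the fragment
  stringsAsFactors = FALSE
)

# Convert to the 1-based, closed convention used by GRanges objects:
# only the start shifts by one; the end coordinate is unchanged.
bed_to_one_based <- function(bed) {
  data.frame(
    seqnames = bed$chrom,
    start    = bed$start + 1L,
    end      = bed$end,
    stringsAsFactors = FALSE
  )
}

fragments <- bed_to_one_based(bed)

# In the 1-based convention the fragment length is end - start + 1,
# which equals end - start in the BED convention.
fragments$width <- fragments$end - fragments$start + 1L
print(fragments)
```

If the GenomicRanges package is installed, `GenomicRanges::makeGRangesFromDataFrame(bed, starts.in.df.are.0based = TRUE)` should apply the same shift while building a GRanges object directly.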
For the analyses explained in this file, we start from the raw data as saved in ‘.rds’ files containing GRanges objects. In order to construct the plots in the paper, we often generate temporary files which are too big or too numerous to upload to the GitHub repository. These files will be stored in a folder called ‘cfepigenetics_data’, located at the same level as the folder containing the repository (cfepigenetics). The temporary files stored there are used as input for a summarizing script that generates a final summary file, which is small enough to store in the data folder of this GitHub repository.
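The folder layout and the two-step flow (per-sample intermediate files, then one summary file) can be sketched as follows. This is an illustration under assumptions: a temporary directory stands in for the real parent folder, and the sample names, file names and toy contents are hypothetical rather than taken from the repository's scripts.

```r
# Stand-in for the directory that holds both the repository and the data folder.
# In practice this would be the parent directory of the 'cfepigenetics' clone.
parent_dir <- file.path(tempdir(), "workspace")
repo_dir   <- file.path(parent_dir, "cfepigenetics")
tmp_dir    <- file.path(parent_dir, "cfepigenetics_data")  # sibling of the repository
dir.create(file.path(repo_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(tmp_dir, recursive = TRUE, showWarnings = FALSE)

# Step 1: write one intermediate file per sample (toy content, hypothetical names).
samples <- c("sample_A", "sample_B")
for (s in samples) {
  per_sample <- data.frame(sample = s, n_fragments = nchar(s) * 1000L)
  saveRDS(per_sample, file.path(tmp_dir, paste0(s, "_intermediate.rds")))
}

# Step 2: combine all intermediate files into one small summary file
# that is stored inside the repository's data folder.
files <- list.files(tmp_dir, pattern = "_intermediate\\.rds$", full.names = TRUE)
summary_df <- do.call(rbind, lapply(files, readRDS))
saveRDS(summary_df, file.path(repo_dir, "data", "summary.rds"))
print(summary_df)
```

Keeping the large intermediate files outside the repository means only the small summary file needs to be committed.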
The data used in this study was made publicly available when the previous studies were published.
The methylation data of cell-free DNA used in this study was published before.
The gene expression data of white blood cells (myeloid) used in this study was made publicly available.
Gene expression and DNA methylation data of other tumors and healthy white blood cells (Supplementary Figure 15) was downloaded from the website of The Cancer Genome Atlas (TCGA).
In order to go from the raw sequencing data, as presented in the ‘fastq’ files, to the GRanges objects containing information about cell-free DNA fragment positions (as defined by ‘chromosome’, ‘start’ and ‘end’), we refer to the GitHub repository from the paper: Mathios et al., Nature Communications, 2021. In the code folder, there is a pre-processing folder containing the scripts to pre-process the ‘fastq’ files.
This figure is a variation on Figure 2A, using different beta-value cut-offs to define ‘methylated’ and ‘unmethylated’.
* Pre_Figure2.rmd: contains a step-by-step guide to the scripts that process the raw data (GRanges objects; per sample) into intermediary files (per sample) and summarize them (all samples) into a summary file, uploaded to this repository (data). This script requires the raw data (after pre-processing the data from Cristiano et al. and Mathios et al.).
* Supplementary_Figure2.rmd: processes the summarized file and generates Supplementary Figure 2.
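How the choice of beta-value cut-offs changes which CpGs count as ‘methylated’ or ‘unmethylated’ can be sketched in base R. The function name `classify_methylation`, the beta-values and both pairs of cut-offs are illustrative assumptions, not the values used for the figure.

```r
# Classify CpG beta-values with a pair of cut-offs.
# The cut-offs used below are illustrative assumptions, not those of the paper.
classify_methylation <- function(beta, low, high) {
  stopifnot(low < high)
  status <- rep("intermediate", length(beta))
  status[which(beta <= low)]  <- "unmethylated"
  status[which(beta >= high)] <- "methylated"
  status[is.na(beta)]         <- NA_character_  # missing beta-values stay missing
  status
}

# Toy beta-values for five hypothetical CpG positions.
beta <- c(0.02, 0.25, 0.50, 0.85, 0.97)

# Two hypothetical cut-off pairs: a strict one and a more lenient one.
strict  <- classify_methylation(beta, low = 0.1, high = 0.9)
lenient <- classify_methylation(beta, low = 0.3, high = 0.7)

print(data.frame(beta = beta, strict = strict, lenient = lenient))
```

With the stricter pair, fewer positions are assigned to either class, so the comparison between classes rests on more clearly separated CpGs at the cost of fewer positions.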
Supplementary_Figure15.rmd: processes the summarized file and generates Supplementary Figure 15.
pander::pander(sessionInfo())
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
loaded via a namespace (and not attached): digest(v.0.6.33), R6(v.2.5.1), fastmap(v.1.1.1), xfun(v.0.41), cachem(v.1.0.8), knitr(v.1.45), htmltools(v.0.5.7), rmarkdown(v.2.25), lifecycle(v.1.0.4), cli(v.3.6.1), pander(v.0.6.5), sass(v.0.4.7), jquerylib(v.0.1.4), compiler(v.4.3.2), rstudioapi(v.0.15.0), tools(v.4.3.2), evaluate(v.0.23), bslib(v.0.6.0), Rcpp(v.1.0.11), yaml(v.2.3.7), rlang(v.1.1.2) and jsonlite(v.1.8.7)